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ABSTRACT 
Ligand efficiency is a widely used design parameter in drug discovery. The dependence of ligand efficiency on the concentration unit 
can be eliminated by defining efficiency in terms of sensitivity of affinity to molecular size and this is illustrated with reference to 
fragment-to-lead optimizations. An alternative to ligand efficiency for normalization of affinity with respect to molecular size is 
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presented. The importance of examining relationships between affinity and molecular size directly is stressed throughout this study. 
To upgrade the contemporary /n Silico drug design agenda, a novel computational version of Markov chains theory has been 
proposed. This is about to predict some crucial patterns of the ligand-receptor recognition and coupling. 


Keywords: Markov chains, Bailey equation, in silico pharmacokinetics, ligand efficiency, ligand-receptor docking, drug design 


computational models. 


1. INTRODUCTION 


Most chemical starting points for design lack the affinity required to function as drugs and optimization typically results in increased 
lipophilicity, molecular size and molecular complexity (De Longhi, 2016; Leder et al., 2018). This highlights excessive molecular size 
and lipophilicity as primary design risk factors. Risks associated with molecular complexity (Katz et al., 2011; Barthels et al., 2018) are 
more likely to be encountered in the screening phase of a project. Molecular complexity can also be seen inversely as the degree to 
which a compound is structurally prototypical (Sorwall, 2017; Chandrasekhar, 2019) (e.g., minimally substituted) and might also be 
defined in terms of the molecular shape (Argyle & Niemer, 2016; O'Leary, 2018) of a compound or the roughness (Thallerstrom, 
2017; Minz et al., 2019) of its molecular surface. Molecular recognition (Roetsch et al., 2014; Burck, 2018) provides much of the 
conceptual framework for drug design and many medicinal chemists consider molecular interactions (Dershey et al., 2016; Pitot, 
2017) when elaborating chemical start points. While a structure-activity relationship can point to the importance of individual 
interactions, the contribution of a protein-ligand contact to affinity is not, in general, an experimental observable (Helinek et al., 
2017; Drumm, 2018). 

It would be safe to say, however, that a weak link in a row of the drug design leading events is a hard way to make a choice of 
the most efficient pharmacophore revealed within a paradigm of the «drug-target», i.e. «ligand-receptor», affinity docking. To 
optimize a solution of this dilemma, an arsenal of mathematical methods might be employed once they're focused on a modeling 
and testing of the above mentioned phenomena. 

As per these methods themselves, they are still far of being perfect and yet there is «enough room ahead» to move forward with 
an attempt to upgrade the current probabilistic computational outlook for better /n Silico ligand-receptor fitting. This attempt our 
present study is all about. 


2. COMPUTATIONAL MODEL 


Bailey Differential Equation New insight 

Markov chains provide an appropriate mathematical framework for the treatment of a vast variety of theoretical and applied 
problems. Notably, neither exhaustive calculations of distribution parameters nor explicit expressions for probability functions are 
necessary in many of the applications for which knowledge of the initial momenta of the distributions proves to be sufficient. Bailey 
(1964, 1970, 1982) suggested a highly practical technique to this end which, however, remained largely overlooked by both theorists 
and practitioners as the Authors did not back it with adequate validation. Legible proof of Bailey's formula is presented in this work 
in a form suited for immediate practical use. 

Below t stands for time and x(t),t => 0 denotes a homogeneous Markov chain with continuous time and the state space No 
consisting of non-negative integers (the population, in the basics example considered). The process values x(t) at time t are 
denoted as{x(t)}, and Ax(t) = x(t + At) — x(t) is the Markov process increment (the population change over the period of time 
from t to t + At). The probability distribution at time t is determined by the probabilities py) of the population numbering x(t) 
species at time t. 

The probability-generating function of the distribution for the process x(t) is given by 


P(z,t) = E{z*} = > Dee) 2° 
x(t)20 


with |z| <1. The transition function for the Markov process is defined by the probability distribution for Ax(t). For a 
homogeneous Markov chain (provided that the transitions occur), we have, up to infinitesimal corrections 6(At), 


P{A x(t) =f |x(O} = FO) Atj #0, (1) 
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where the transition intensities f;(x(t)) are non-negative functions depending solely on x(t) for fixed values of j. It should be 
noted that, since the set of states x(t) for the process considered is comprised of non-negative numbers, the reasonable assumption 
is that f;(x(t)) = 0 for j < —x(¢). 

In this case, the probability that no transition occurs between t and t +A t is, up to an infinitesimal term (A t), given by 


P{A x(t) = 0| x(t}}=1-Yjeof @(O) At. (2) 

Let E,(g(x(t))) stand for the expected value of g(x(t)) at time t, g(u) with u => 0 being a measurable function; let E,4¢(g(x(t + 
A t))) be the expected value of g(x(t)) at t +At; and let Ey;ar{g(x(t +A t)|x(t)} be the conditional expectation for g(x(t +A ft). 
Also, assume that M(6,t) = E,{e%} = P(e®,t) with @ <0 is the Laplace-Stieljes transform of the probability distribution for 
process x(t), which is designated as the moment-generating function; K(6,t) = In M (6,t) is the cumulant-generating function. The 


cumulant-generating function is customarily represented in the form of Taylor series in 6 


Ri) hal) 


2 wee 
1! 2! es 


K(6,t) = 


Here k; is the i-th cumulant of the x(t) process at time t > 0. The first cumulant is equal to the expected value, the second — to 
dispersion, and the first cross-cumulant — to covariance (Candall & Stewart, 1966; Stewart, 1980; Dupret et al., 1997; Gorowitz, 2001). 


Theorem 1 
Suppose the above homogeneous Markov chain x(t),t = 0 with continuous time and with the state space No is defined and its 
generating function M(8,t) is differentiable. Then the generating function of the homogeneous Markov process is governed by the 


equation 


am (6,t) 


ar = Etldjzo(e?? — 1) FM )e*],0 <0. (3) 


Proof. 1. Assuming that all of the expected values implied below exist, the expected value obeys the relation 
Erzaclg{x(t +A t)}] = E-{Erjaclg{x(t +A t)3]} = E{Erjaclg{x(t) +A x(t)}]}. (4) 
Given the above and as long as the expected values exist and the process is of the Markov type, the moment-generating function 


for the process x(t),t > 0 at time t + At can be written with the help of Eq. (4) as 


M(O,t +A t) = Erg acle* 9] = By [Exjaele°O]] = B, > eA XO+i (f,(x(t)) + o(A t)) 
J 


=F, 20508 (Fyx(t)) + 0(A t))] = Er [0 By nele?*O]. 
J 


Therefore, 


M(6,t +A t) = E, [EOE ,,[e°]]. (5) 


2. Consider the following limit 
—_ - ; i (“5 x(t) = O|x(t)} A te? + YizoP {Ax = j|x(O} A te? — 2 
t/At = m 


lim_ £, 
At>0+0 


At = Ae Deo At 
1 —Yigo fi (x(t) At) + (Xjeo0 fi ()) A te!?)} -1 
= tim Ljz0 fj &() At) Gy of (x(t) A te#*)} = Ser) fylx()). 
JF 
Thus, 
neeg At — = Ljxol el? ~ 1) f,((e)). (6) 
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3. The derivative of the moment-generating function exists and 


OM (6, : 0, -M(6, 
Mt t) in, ee = dim = [Efe Ear [e9xO)]} — E, [e%)]. 


eG Ax(t)_ 


Since dim, Eevae {— —} exists and depends on f;(x(t)) according to Eq. (6), 


; OAx(t)_ 
gilt cs [Efe Bij. ele oaxt]} — x, [6%] =e lero Petite : At | ' 


Theorem 1 follows from Eqs. (6) and (7). 


Theorem 2 
If, for the above homogeneous Markov chain with x(t),t = 0, with continuous time, and with the state space No, the functions 


f,(x(t)) can be presented as polynomials of the form 


fjx@) = ae aj, x(t)*, j = —x(t) 


and if the derivatives implied below exist, the following differential equation holds true 


aM(6,t akM(O,t 
. = Ljz0(e/? — 1) Lito ax eu 2 (8) 


Proof. Taking into account that 


M(6,t) = E,[e%©] = E, |(e a 
aM(o, : 
= ee fxoley) 
eee = Be [xCo*(e*) (9) 


Eq. (1-3) can be cast in the form 


sl) t) Ss ye" — 1) f(x(oye%O| = F, Ye 7 oy) ay x(t)ke%XO| = =). (ci9 — oy) A) ue Zs 


j#0 j#0 j#q0 
which proves the theorem. 


Theorem 3 
If, for the above homogeneous Markov chain with x(t),t = 0, with continuous time, and with the state space Nog, the functions 


f,(x(t)) can be presented as polynomials of the form 


fje@) = > aj, x(t)*, j = —x(t) 


and if the derivatives implied below exist, the following differential equation holds true 


k O*P (zt) 


aP(z,t 
o. d =F jzol2’ = 1) Deez os ae 
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aP(z,t) a*P(z,t) 


Proof. Given that the derivatives exist and that M(@,t) = P(e®,t), the change of variables from z to e® yields 


at” azk 
OP(z,t) _OM(6,t) 
dt =—<‘i«i‘C EC‘ ’ 
OP(z,t) _OM(6,t) 
"az. —~sC«‘“C«C US 
ke OkPCz,t) _ akM(6.t) (10) 


azk aok 


The left-hand sides of the above expressions exist, meaning that so do the corresponding right-hand sides. Consequently, the 
requirements of Theorem 2 are met. Substituting Eqs. (1-10) into Eq. (8), one arrives at the result stated by Theorem 3. 

The approach stemming from the above derivations is that the differential equation for the moment-generating function can be 
spelled out directly when the functions f;(x(t)) are available. The practical applications of the above differential equations are 


examined below. 


Outcoming Algorithms 

1. Suppose that a two-dimensional homogeneous Markov chain (x(t), y(t)),t = 0 with the state space (No x No) and continuous 
time is treated and that, similarly, the transition intensities are non-negative functions such that fi;(x(t),y(t)) = 
Ye=0 Li=0 Aijxi x(t)*y(t)! . Then, if the pertinent derivatives exist, Eq. (8) affords the following generalization 


ome 0) ae ae ery De ak+'M(6,9,t) 
Gijkl —>oka,l a0k ag! : 


2. Consider a multidimensional Markov chain x(t) = {x,(t), x2(t), ...,Xp(t), ... },t 2 0 with continuous time and the state space N = 
{No X No X ...No X ... }, and denote 6 = {0,, 03, ..-, On» eS = Gada. -sdn +}. It can be demonstrated that, provided that the pertinent 
expected values and derivatives exist, in the general case Eq. (3) translates into the vector equation 

aM(6,t) 


=e 225 (0? ~1) F(t) |. 


j#0 


3. RESULTS 


Applications of Bailey’s Equation to Kinetic Schemes 
First-Order Elementary Chemical Reaction 
Suppose that the process of decay of substance A paralleled by the generation of substance B evolves with the probability a per 


molecule: A > B. The process is described by the function f_, = at, and Bailey's equations become 


OM OM 
pe MEP ag 
or 
OK OK 
i -6_ 4) 
ae = tle’ — Vag 
or, alternatively 
dk, 
a 
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so that 
ky(t) = m(t) = age, 


ky(t) = o2(t) = age“ (1 — ee“). 


Here ag is the initial concentration of A, assuming that the initial dispersion of A is zero. 
a 
For the simplest reversible reaction ALB, the formation of B is described by f; = a(ay — x) and the decomposition — by f_; = 
B 
Bx, x being the random number of molecules of B. The corresponding Bailey's equation is 


OM OM OM 
Ot = aa(e? = 1)M Se a(e® = 1) + B(e? =: ag 
or 
aay 
= ——_(1—-— —(atBp)t 
1 atB ae ) 
aay = aBdo 2 ~ 
ko = ——_ (a+B)t (a+B)t _ 1 pes id (2a+2B)t _ 2 (a+B)t . 
2 a+p (e apes +e e ) 
Ligand-Receptor Interaction 
a 
Consider a ligand-receptor interaction R + LORL, where R is the receptor, L is the ligand, RL is the ligand-receptor complex, @ is the 
B 


probability of formation of a complex molecule, and f is the probability of its dissociation. If the random number of ligand-receptor 
complex molecules is x, and the initial number of receptors is Ro, the number of free receptors makes Ry — x. Assume that the 
process unfolds under the condition of large ligand surplus, so that the number of ligand molecules stays equal to its initial value Lo. 
The formation of ligand-receptor complexes is described by the function f; = aL (Rp — x), and their decomposition — by f_, = Bx. 


Bailey's equation for the case is 


am(é, am(é, am(6, 
om) = LoRoa(e® — 1)M(@,t) — Loa(e® — yee) + B(e~? - yee) 
or 
er = LoRoat — Loatks (0) ~ Bkx(O) 
ae) = LoRoa — Loak(t) — 2Loaka(t) — Bk, (t) + 2Bk2(t), 
or 
blr 
B=) = 2g ee Bie), 
SH29Qe— a l is l i l 
2(t) =a°(t) = Gita! — exp [—(61 + a)t]) + Gia [—(B1 + a)t] (1 — exp [-(f1 + a)t]) 


Pharmacokinetic Outlook 
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A pharmacokinetic model of the dependence of drug concentration on time is used to gain insight into the temporal character of 
the emergence of dose-response relationships, the underlying assumption being that the drug is administered per os. In the simplest 
case, the process is described by the single-compartment model: 


Intake Bloodstream 

Location Compartment 

mt k mv k, 
—=_>- — > 


m,(0)=M m,(0)=0 


(11) 


Here m,(t) is the drug mass at the intake location, m2(t) is the drug mass in bloodstream, k,; and k,, are the rates of drug 
administration and elimination from blood. The conditions that the drug is initially localized where it is being introduces are 


expressed as 
m,(t) = 0,m2(t) = M. (12) 


The law of mass action for scheme (11) and Eq. (12) is 


dm, 
at = —k,ym,,m,(0) = M, 
(13) 
dm, 
di kym, — keymz,m2(0) = 0 
The solution to the above set of equations is 
m, = M exp (-k,t), 

mz = Mlexp (—k,,t) — exp (—k,t).] (14) 


An analogous set of equations for a drug directly injected into the bloodstream is 


OA GOV SO 
dt = ,m,(0) = , 


(15) 
dm 
dt 


= —kejmz,m2(0) = M, 


its solution trivially being 
m, = const = 0, 
(16) 
mz = Mexp (—k,,t). 


The forms of the solutions to Eqs (13) and (14) are impractical, considering that the drug concentration in the bloodstream rather 
than its total mass is typically measured experimentally. Eqs. (13) and (14) can be conveniently transformed using the fact that drug 


concentration C and mass m are related as 


C=m/V, (17) 
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where V is the blood volume. The latter may actually change due to a range of factors such as, for example, the use of diuretics. 
However, it can be assumed if the drug does not affect diuresis that V = const ~ 5L. Then, the combination of Eqs. (14) and (17) 


results in 
C(t) = m,(t)/V = (M/V) (exp (— kit) — exp (— ky t)), (18) 
C(t) = Co(exp ( — ket) — exp (— kit)), (19) 


where C(t) is the time-dependent drug concentration in the bloodstream and Cy is a constant denoting its initial effective 
concentration. The drug concentration increases initially and subsequently decreases. 
If the drug is directly injected into the bloodstream, the solution is more compact than the one defined by Eqs. (18) and (19) 


C(t) = Co exp (— ket). (20) 


The latter expression shows that in this case the drug concentration in the bloodstream decreases monotonously. 
Importantly, the majority of drugs in blood bind to transport proteins rather than stay in free state. The formation of the complex 
involving transport protein is described by the scheme 


k 
H+POHP (21) 


where H is the drug, P is the blood protein, HP is their complex, and k is the dissociation constant. 

The drug concentration generally tends to be much lower than that of the blood proteins. For example, the concentration of 
albumin, which is the key binding blood protein, is 10°? M while the concentration of the nerve growth factor only reaches 10°°-10°"" 
M (Alberts et al., 1994). The concentration of the growth hormone is 0.5-2.0 nM (Drawczek et al., 2018) while the concentration of 
the binding protein is 1.5 mM (Alonso et al., 2017). Therefore, the concentration of the drug-blood protein complexes for scheme 
(21) is 


[HP] = [Hol[P] (22) 


[Ho] +K 
(Varfolomeev & Gurevich, 1999; Brenner, 2016), where [Ho] is the initial concentration of the drug. For most drugs, K > [H] and, 
accordingly, Eq. (22) becomes 


[HP] = a[Ho] 
with a = [P]/K. Then, the drug concentration is 
[H] = [Ho] — [HP] ~ [Ao] — a) = B[Aol, 
[H] ~ B[Aol, (23) 


where £ is the binding constant. The value # = 1 means that the drug undergoes no binding with blood proteins, and B = 0 shows 
that all drug molecules are drawn into association with blood proteins. 

It may be the case that only bound drug (e.g. bilirubin) or only unbound agent (e.g. sex steroids) is excreted. In this situation, Eq. 
(23) is rewritten as 


dm, 
at. = —k,m,,m,(0) = M, 
dmz 


a7 Kim — Vkerm2,m2(0) = 0, (24) 


where y is a constant such that y = a if only the bound form of the drug is excreted and y = f in the opposite case. The solution 
to Eqs. (14-24) is 
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C(t) = Co(exp (— ykeit) — exp (— kit)). (25) 


It should be noted that the underlying assumption in the analysis of biological effects which are due to the evolving drug 
concentration on the basis of Eq. (14-25) is that only the free form of the drug triggers response. 


4. DISCUSSION 


Ligand Efficiency and Molecular Dynamics 

Compound-level efficiency metrics are typically constructed by either scaling (i.e. divide affinity by risk factor) or offsetting (ie., 
subtract risk factor from affinity) (Waugh, 2019). LE was introduced (Dirck, 2018) as a metric to normalize affinity with respect to 
molecular size by scaling the standard free energy of binding, AG°, by the number, Nyy, of non-hydrogen atoms (the term heavy 
atoms is also used) in the molecular structure as follows: 


Ag(T.P,C?) = (=) (26) 


The standard state was not specified when the LE metric was introduced (Marshall & Bullach, 2018) although it appears to be 
widely believed (Marshall, et al., 2019) that C° must be set to 1 M for calculation of LE. The Achilles heel of the LE metric is its 
nontrivial dependency (Reuven, 2018) on C° and, as conventionally (Farrand, 2019) defined; LE has a 1 M concentration unit built 
into it. As noted in (Telashima & Katoh, 2017) the choice of a particular value of C°, such as 1 M, to define the standard state is 
entirely arbitrary and a requirement that C° only take a specific value cannot be accommodated within the framework of 
thermodynamics. This means that LE cannot be defined objectively in absolute terms for individual compounds because there is no 
physical basis for favoring a particular value of C° for calculation of LE. 

Drug design guidelines are typically based on trends observed in data and the strengths of these trends indicate how rigidly 
guidelines should be adhered to. While excessive molecular size and lipophilicity are widely accepted as primary risk factors in drug 
design, it is unclear how directly predictive they are of more tangible risks such as poor oral absorption, inadequate intracellular 
exposure and rapid turnover by metabolic enzymes. This is an important consideration because the strength of the rationale for 
using LE depends on the degree to which molecular size is predictive of risk. Drug discovery scientists need to be wary of correlation 
inflation (Brenner & Horst, 2019) which can be loosely defined as presentation or analysis of data in any way that makes trends 
appear to be stronger than they actually are. Correlation inflation is a particular concern when analysis of proprietary data is 
presented in support of a view that a set of guidelines is especially useful or predictive. 

The relevance of data must also be considered when using physicochemical characteristics such as molecular size to assess risk. 
For example, an activity threshold (Delbreaux et al., 2014) of > 30% inhibition at 10 uM for promiscuity analysis is not especially 
relevant if considering the likelihood of off-target effects for a drug with a peak unbound plasma concentration of 100 nM. Sample 
bias can be significant, even in large datasets, as exemplified by divergent conclusions of two apparently similar studies (Radchenko 
& Ludoff, 2017) with respect to the relationship between pharmacological promiscuity and molecular size. The observation that 
average molecular weight appears to decrease (Jelinek et al., 2017) with promiscuity is particularly relevant to the use of LE because 
promiscuity would generally be considered (Baglioni et al., 2018) to be an undesirable characteristic for a compound. Drug designers 
should not automatically assume that conclusions drawn from analysis of large, structurally-diverse data sets are necessarily relevant 
to the specific drug design projects on which they are working. 


Thermodynamics Aspects of Ligand-Protein Association 
The LE metric (Tomasetti, 2017) was introduced in thermodynamic terms and it is sometimes believed that it measures the degree to 
which molecular interactions between ligand and target are optimal. 

The standard free energy of binding, AG°, (Udvardi & Lakatos, 2017) can be written in terms of the gas constant (R), 
thermodynamic temperature (T), C° and the equilibrium concentrations of protein ([P]), ligand ({L]), and protein-ligand complex 


([P.L]): 


_ (Pt) 
AG? = RT In( 2) (27) 
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Equation (27) shows that AG° is a function of C° and this is one reason that values of standard free energy of binding should not 
be termed absolute. By convention, C° is taken to be 1 M although, this is arbitrary and the value of C° has no physical significance 
(Alonso et al., 2016). In thermodynamic analysis, a change in perception resulting from a change in a standard state definition would 
generally be regarded as a serious error rather than a penetrating insight. In some situations, the dissociation constant, Kp, is defined 
to be equal to the argument of the logarithm in equation (27) and is therefore dimensionless. However, in medicinal chemistry, 
biochemistry and biophysics, Kp values are conventionally quoted in units of concentration and equation (27) can be written as: 


AG°(T, P,C°) = RT In (2S) (28) 

Equation (28) shows that a tenfold increase in C° leads to a decrease in AG° of 1.36 kcal/mol at 298 K. The sign of AG° has no 
special significance and simply indicates whether or not Kp is greater or less than C°. The dependence of AG° on C? is a consequence 
of the stoichiometry of association of ligand with target and AG° for formation of a ternary complex (relevant when considering the 
thermodynamic consequences of fragment linking) will exhibit a different dependence on C° to AG®° for a binary complex. The 
stoichiometry corresponding to a AG° value is specified by the change, AN, in the number of species for the corresponding reaction 
and it can also be seen as a ‘hidden dimension’ of AG®. For example, formation and dissociation of 1:1 complexes have AN values of 
-1 and +1 respectively. The value of AN determines the dimensions of the corresponding equilibrium constant: 


dim K = (concentration) (29) 


The dependence of AG° on C° is a consequence of the loss of translational entropy resulting from association and it has two 
important implications. First, ratios of AG° values also depend on C° even though the ratios themselves are dimensionless and AG° 
values should therefore be compared as differences (i.e. AAG). Second, if a free energy change is written as a sum of free energy 
changes then the sum needs to have the same dependency on C° as the original free energy change since the equality must hold for 
all values of C°. This is equivalent to requiring that the sum of AN values for the components of free energy decomposition be equal 
to the AN value for the free energy change that is decomposed. 

One way in which stoichiometry can be accounted for in free energy decompositions is to associate each free change with its 
corresponding AN value using square brackets. The study on attribution and additivity of binding energies can be used to illustrate 
this: the intrinsic binding energy for a group X as the difference in AG° for compounds in which X is present (AX) or absent (A) in the 
relevant molecular structures (Ashley, 2016): 


AG: [0] = AGSy[—1] — AG2[—7] (30) 

The intrinsic binding energy is associated with a zero value of AN and is therefore independent of C°. It shows the AG° value for 
a compound with linked groups A and B in its molecular structure as the sum of the intrinsic binding energies of A and B, and the 
“connection Gibbs energy” (AG*): 
AG%n[—1] = AG4[0] + AG§[O] + AGS[—7] (31) 

Equation (31) is particularly relevant to fragment linking and it is important to note that AG° does depend on C° (Marciewicz, 
2017). In some studies, AG° is decomposed into a value corresponding to zero molecular size (AGys-9) and a AAG value (Wiespanski, 
et al., 2017): 


AG°[-1] = AGys=o[—1] — AAG[O] (32) 


One general approach to modelling affinity is to use equation (33) in which Aj (i > 0) is a parameter associated with the 
substructure i and ni is the number of occurrences of that substructural element: 


AG°[—T] = Aol—7] + XPS nj; x A;[0] (33) 


www.discoveryjournals.org OPEN ACCESS 


disc@@very 


pase | 2 4. 


ARTICLE 


The AO term has the same dependency on C° as AG°and its inclusion in equation (33) allows changes in concentration unit to be 


easily accounted for. The substructures are typically groups at substitution sites on a scaffold and the ni values are either 1 or 0 and 
Ao may correspond to the affinity of the unsubstituted scaffold. 

Schemes for decomposition of AG° based on equation (33) cannot be considered to be group additive because of the presence 
of the Ao term which is not associated with any group. 

An equivalent way to examine the stoichiometry issue is to consider the implications of writing Kp as follows where 
k,H Corresponds to Ag as defined in equation (26): 


Kp = (knw )N™™ (34) 


Consider two compounds X (Kp = 10-3 M; Nay = 10) and Y (Kp = 10-6 M; N,H= 20) that would usually be considered to be 
equally ligand-efficient (Ag = 0.4 kcal/mol per non-hydrogen atom at 298 K for C° = 1 M). While the values of k,4 calculated for X 
(0.501 M®') and Y (0.501 M°°5) have the same numerical value, it is incorrect to equate them because their dimensions differ, as 
reflected by the difference in their respective units. If Kp is expressed in millimolar units, the numerical values of k,y for X (1 mM°") 
and Y (0.708 mM°®°) are no longer identical. 

Some of the entropy of binding results from molecular interactions (e.g., between water molecules) that are non-local with 
respect to protein-ligand contacts. Some contributions to binding enthalpy, such as the enthalpic penalties associated with ligand 
and target adopting their bound conformations are also inherently non-local. A less obvious example of a non-local effect would be 
substitution at one position of a molecular structure preventing a substituent at another position from forming optimal interactions 
with the target. When interpreting binding thermodynamics in terms of molecular interactions, it should always be kept in mind that 
intermolecular contacts (e.g., between unbound ligand and solvent) that are not present in the protein-ligand complex also 
influence AH and AS°. 


Perception of Affinity Varies with Concentration Unit 
Some of the problems that result from using LE as a design metric can be seen more clearly if it is expressed using a base 10 
logarithm and without energy units: 


Nbind = — (~) x 109; (a) Ser va 


The quantity n,;,4 is related to Ag by a multiplicative factor of RT In(10) that is independent of C° and therefore both quantities 
respond in an identical manner to a change in C’. One rationale for using n,;,,4 is that drug discovery scientists typically use plCso or 
pKp rather than AG® in «drug-target» analysis. The quantity n,;,4 is also related to ligand efficiency by atomic number (LEAN) 
(Lepellier, et al., 2016) that is calculated by scaling plCso by Nay. Unlike LEAN, n,;,4 is a function of C° and can also be written as 
Nbingc. to emphasize this. Although standard state conventions do not apply to potency measures such as IC50 and ECso, which are 
usually quoted in uM or nM, potency must still be scaled by a concentration value for the logarithm calculation because the 
logarithm function is not defined for dimensioned quantities (Schramm & Kuntz, 2018). Using n,;,4 rather than AG® reinforces the 
point that the problems associated with LE are due to the mathematical behavior of the logarithm function. While the use of a 
concentration unit other than 1 M to define LE is unusual, there certainly is precedent for doing so. 

LE is used to specify affinity cutoffs as a function of molecular size and a Ag value of 0.3 kcal/mol per non-hydrogen atom has 
been suggested (Fouquet & Berthault, 2017). Specification of affinity cutoffs in this manner forces the line defining acceptable 
affinity to intersect the affinity axis at a point corresponding to a Kp value of 1 M. The minimum Ag value of 0.12 kcal/mol per non- 
hydrogen atom recommended (Allwerck et al., 2017) can be translated (C° = 1 M; T = 300 K) to pKp values corresponding to the 
lower (700 Da; Nay * 50) and upper (3000 Da; N,y ~ 214) limits. The lower (pKp= 4.4) of these two values would not appear to be a 
useful design criterion while the higher value (pKp= 18.7) would not generally be measurable. In general, affinity thresholds should 
be specified directly and LE should only be used for this purpose if supported by the data. 

LE features prominently in the literature of fragment-based lead discovery (Ekkert & Bauer, 2018) to the extent that it is 
sometimes presented as an important rationale for screening fragments. 

Comparison of LE values for fragment hits and the corresponding leads can be seen as an attempt to quantify how effectively an 
increase in molecular size translates to affinity. This is still a valid objective even though the LE metric would appear to be unfit for 
this purpose. The most obvious way to do this is to scale ApK, by ANpx: 
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ApKp __ 1 KplF] 
ANnH = (ae : log 10 Gu (36) 


Using ApK, (the logarithm of a ratio of Kpvalues) eliminates the dependency on C° that makes An,,,4 (and Ag) unsuitable for 
comparison of start and end points for projects. An additional benefit is that ApK,, is likely to be relatively insensitive to the 
approximation of Kpby IC50. This approach to assessing optimizations has precedent (Bolton & Murdock, 2018) and reported that a 
tenfold improvement in Kp corresponded to a mean increase in molecular weight of 64 Da (standard deviation = 18 Da) for 73 
compound pairs. Some other reports (Sieliwanowicz, 2018) also illustrate the benefit of observing the response of affinity to an 
increase in molecular size directly rather than indirectly by using the LE metric. 

It can be useful to compare the changes in affinity and lipophilicity that result from structural elaboration and one way of 
achieving this is to offset the change in affinity by change in lipophilicity: 


_ Kp[F]XP[F] 
ApKp —-A log P= log 10 (Caen) (37) 


The quantity in equation (37) may be regarded as a measure of the lipophilicity efficiency. It is desirable that it should be as large 
as possible most drug design cases studied. Variations of equation (37) can also be written using potency (e.g. pIC50) with a 
measured distribution coefficient (logD) or a predicted value of logP (Vagel et al., 2017). 

Observation that a small structural change leads to a large change in affinity is usually informative. Group efficiency (GE) (Ochoa 
& Guerlasquez, 2016) is defined for the addition of a group, X, to A by scaling the value of the associated AAG (AG, as defined in 
(Mishito, et al., 2019)) by ANju: 


GE[A > AX] = — (seclana41) 


ANny[A> AX] (38) 

The notation [XY] can be used to specify structural transformations and to indicate that a change in the value of a property 
such as AG®°, pKp or N,y has been calculated by subtracting the value of the property for compound X from that for compound Y 
Qurkovic, 2017). The definition of GE expresses equation (36) in terms of free energy rather than dissociation constant and equation 
(37) could be used in an analogous manner to specify the efficiency of substitutions from the perspective of lipophilicity. The 
fundamental difference between the two metrics is that GE is independent of C° because it is defined in terms of AAG. Although GE 
is sometimes presented as a substructural (e.g. chloro substituent) property, it is actually structural transformations (e.g. substitute 
hydrogen with chlorine) with which values of GE should be associated. The AAG values used for calculation of GE cannot generally 
be interpreted as substructural contributions to affinity because summation of values of AAG (AN = 0) cannot reproduce the 
dependency of AG° (AN = — 1) on C°. 


Maximal Affinity of Ligands 

Drug discovery scientists typically need be able to address a range of questions when interrogating project data. For example, it may 
be useful to focus analysis on the most active compounds in an optimization project. It is important to stress that residuals are not 
generated in isolation and they result from analysis that, arguably, should be performed anyway. The line fit to a plot of affinity 
against molecular size is likely to be a better predictor of outcome than a line that has been artificially forced to intercept the affinity 
axis at a point corresponding to a Kp value of 1 M (Von Trotta & Lemke, 2019). The strength of the trend also provides an indication 
of how useful normalization of the data is likely to be. For example, the observation of a very weak correlation between affinity and 
molecular size for hits from a fragment screen suggests that molecular size need not be accounted for when assessing the fragment 
hits in question. In an optimization project, a relatively weak correlation between affinity and molecular size may point to the extent 
that it cannot be adequately explained by molecular size alone. 


5. CONCLUSIONS 


A neglected Baileyan computational approach is now modified to renovate and improve the /n Silico pharmacokinetic modeling 
suitable for either preclinical trail planning or the drug- receptor docking scenaria analysis. This was found a promising research tool 
for the «drug-target» interaction analysis required by a contemporary drug design paradigm. 

LE has been discussed in depth from a physicochemical perspective in this study and the difficulty of interpreting affinity in terms 
of molecular interactions was highlighted. The nontrivial dependency of LE on the concentration unit in which affinity is expressed 
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means that LE has no physical significance and, strictly, should not even be considered to be a metric. As such, LE is unsuitable for 


ranking compounds, setting acceptability thresholds for affinity and modeling relationships between affinity and molecular size. 
While it does not appear to be possible to quantify efficiency of binding objectively for compounds in an absolute manner, 
efficiency can still be defined in a relative manner by scaling affinity differences by the corresponding molecular size differences. 


Abbreviations 

C°, standard concentration, GE, group efficiency; ICso, half maximal inhibitory concentration; Kp, dissociation constant; LE, ligand 
efficiency; logD, base 10 logarithm of octanol/water distribution coefficient; logP, base 10 logarithm of octanol/water partition 
coefficient; Nyy, number of non-hydrogen atoms in a molecular structure; P, octanol/water partition coefficient; plCso, — 
logio(IC50/M); pKp, —logio(pKD/M); pKp[expt], experimentally measured pKp; pKp[pred], value of pKp predicted by model; pKp[resd], 
residual pKp; R, gas constant; T, thermodynamic temperature; TIP, target interaction potential; Ag®, ligand efficiency calculated from 
standard free energy of binding ; AG°, standard free energy of binding; AN, change in number of chemical species; n,;,q- ligand 


efficiency calculated from logarithmically expressed Kp without energy units. 
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